Vid2speech: Speech Reconstruction from Silent Video
Speechreading is a notoriously difficult task for humans to perform. In this
paper we present an end-to-end model based on a convolutional neural network
(CNN) for generating an intelligible acoustic speech signal from silent video
frames of a speaking person. The proposed CNN generates sound features for each
frame based on its neighboring frames. Waveforms are then synthesized from the
learned speech features to produce intelligible speech. We show that by
leveraging the automatic feature learning capabilities of a CNN, we can obtain
state-of-the-art word intelligibility on the GRID dataset, and show promising
results for learning out-of-vocabulary (OOV) words.

Comment: Accepted for publication at ICASSP 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Isolating the voice of a specific person while filtering out other voices or
background noises is challenging when video is shot in noisy environments. We
propose audio-visual methods to isolate the voice of a single speaker and
eliminate unrelated sounds. First, face motions captured in the video are used
to estimate the speaker's voice by passing the silent video frames through a
neural network-based video-to-speech model. Then the speech predictions are
applied as a filter on the noisy input audio. This approach avoids using
mixtures of sounds in the learning process, as the number of such possible
mixtures is huge, and would inevitably bias the trained model. We evaluate our
method on two audio-visual datasets, GRID and TCD-TIMIT, and show that our
method attains significant SDR and PESQ improvements over both the raw
video-to-speech predictions and a well-known audio-only method.

Comment: Supplementary video: https://www.youtube.com/watch?v=qmsyj7vAzo